Review for NeurIPS paper: Security Analysis of Safe and Seldonian Reinforcement Learning Algorithms

Neural Information Processing Systems

Weaknesses: W1: The study seems to focus too much on algorithms that are based on safety tests. I understand that the analysis may not carry over directly, but it might be worthwhile to also include a study of how easy it is to trick those algorithms. More generally (even for IS algorithms), it was a bit odd to me that the study does not consider attacks on the way pi_e is chosen. W2: It is unclear to me whether the trajectory must still have been performed in the real environment, or whether it can be completely made up (in which case its value only has to lie within the range [0,1]). Also, with model-based methods (for both environment and policy models), it might be possible to single out the few trajectories that are inconsistent with the other trajectories.


Review for NeurIPS paper: Security Analysis of Safe and Seldonian Reinforcement Learning Algorithms

Neural Information Processing Systems

All the reviewers support acceptance based on the contributions, notably the improvements to the robustness of RL algorithms against adversarial attacks and a clear exposition of how these methods can be applied to real-world problems. Please consider revising the paper to address the concerns raised in the reviews and rebuttal, in particular to better explain the scope of the work. Separately, it may be useful to extend the broader impact statement to inform a casual reader that a mathematical safety guarantee on an algorithm is not a replacement for domain-specific safety requirements (for example, the diabetes treatment would still need oversight for medical safety).


Security Analysis of Safe and Seldonian Reinforcement Learning Algorithms

Neural Information Processing Systems

We analyze the extent to which existing methods rely on accurate training data for a specific class of reinforcement learning (RL) algorithms, known as Safe and Seldonian RL. We introduce a new measure of security that quantifies susceptibility to perturbations in training data by defining an attacker model representing a worst-case analysis, and show that existing Seldonian RL methods are extremely sensitive to even a few data corruptions. We then introduce a new algorithm that is more robust against data corruptions, and demonstrate its use on several RL problems, including a grid-world and a diabetes treatment simulation.
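To make the vulnerability concrete, here is a minimal illustrative sketch (not the paper's actual algorithm) of a Seldonian-style safety test: a candidate policy is deployed only if a high-confidence lower bound on its estimated performance exceeds a threshold. The one-sided Hoeffding bound for returns in [0, 1] and all numbers below are assumptions chosen for illustration; the sketch shows how corrupting only a handful of trajectory returns can flip the test's outcome.

```python
import math

def safety_test(returns, performance_threshold, delta=0.05):
    """Illustrative Seldonian-style safety test (hypothetical sketch).

    Deploy the candidate policy only if a (1 - delta) confidence lower
    bound on its mean return exceeds the threshold. A simple one-sided
    Hoeffding bound is used here, assuming returns lie in [0, 1].
    """
    n = len(returns)
    mean = sum(returns) / n
    # One-sided Hoeffding lower bound for bounded returns.
    lower_bound = mean - math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return lower_bound >= performance_threshold

# Clean data: 100 trajectories, each with (importance-weighted) return 0.8.
clean = [0.8] * 100
print(safety_test(clean, performance_threshold=0.6))      # test passes

# Worst-case attacker: corrupt just 10 of the 100 returns down to the
# minimum value 0.0, causing a policy that passed on clean data to fail.
# (Pushing returns up to 1.0 could conversely make an unsafe policy pass.)
corrupted = [0.0] * 10 + clean[10:]
print(safety_test(corrupted, performance_threshold=0.6))  # test fails
```

This mirrors the attacker model in spirit: because the test aggregates bounded per-trajectory returns, an adversary who can fabricate or perturb even a small fraction of the training data can move the confidence bound across the decision threshold.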